153 research outputs found

    BioHEL: Bioinformatics-oriented Hierarchical Evolutionary Learning

    This technical report briefly describes our recent work on the iterative rule learning (IRL) approach to evolutionary learning/genetics-based machine learning. This approach was initiated by the SIA system; a more recent example is HIDER. Our approach integrates some of the main characteristics of GAssist, a system belonging to the Pittsburgh approach of evolutionary learning, into the general framework of IRL. Our aim in developing this system is to use all the good characteristics of GAssist while overcoming some of its scalability limitations.
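
    The iterative rule learning loop mentioned in this abstract can be sketched as follows. This is only an illustration of the general IRL idea (learn one rule, remove the examples it covers, repeat), not BioHEL itself: the evolutionary search is replaced here by a hypothetical exhaustive `learn_one_rule` over single-attribute intervals, and all names and the rule encoding are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Example = Tuple[Tuple[float, ...], int]   # (feature vector, class label)
Rule = Tuple[int, float, float, int]      # (feature index, low, high, predicted label)


def matches(rule: Rule, x: Tuple[float, ...]) -> bool:
    """A rule matches when the chosen feature lies inside its interval."""
    idx, low, high, _ = rule
    return low <= x[idx] <= high


def learn_one_rule(examples: List[Example]) -> Optional[Rule]:
    """Stand-in for the evolutionary search (assumption, not the paper's GA):
    exhaustively pick the interval rule maximising correct minus wrong matches."""
    best, best_score = None, float("-inf")
    labels = sorted({y for _, y in examples})
    for idx in range(len(examples[0][0])):
        values = sorted({x[idx] for x, _ in examples})
        for i, low in enumerate(values):
            for high in values[i:]:
                for label in labels:
                    rule = (idx, low, high, label)
                    score = sum(
                        1 if y == label else -1
                        for x, y in examples
                        if matches(rule, x)
                    )
                    if score > best_score:
                        best, best_score = rule, score
    return best


def iterative_rule_learning(examples: List[Example], max_rules: int = 10) -> List[Rule]:
    """Learn an ordered rule list: one rule per iteration, then drop covered examples."""
    remaining = list(examples)
    rules: List[Rule] = []
    while remaining and len(rules) < max_rules:
        rule = learn_one_rule(remaining)
        if rule is None:
            break
        rules.append(rule)
        # Covered examples are removed so the next rule focuses on the rest.
        remaining = [e for e in remaining if not matches(rule, e[0])]
    return rules


def predict(rules: List[Rule], x: Tuple[float, ...], default: int) -> int:
    """Rules act as a decision list: the first matching rule decides."""
    for rule in rules:
        if matches(rule, x):
            return rule[3]
    return default
```

    In a system of this kind, each call to `learn_one_rule` would be a full evolutionary run; the outer loop is what distinguishes IRL from approaches that evolve complete rule sets at once.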

    An efficient decision rule-based system for the protein residue-residue contact prediction

    Protein structure prediction remains one of the most important challenges in molecular biology. Contact maps have been extensively used as a simplified representation of protein structures. In this work, we propose a multi-objective evolutionary approach for contact map prediction. The proposed method bases the prediction on a set of physico-chemical properties and structural features of the amino acids, as well as evolutionary information in the form of an amino acid position specific scoring matrix (PSSM). The proposed technique produces a set of decision rules that identify contacts between amino acids. Results obtained by our approach are presented and confirm the validity of our proposal. Funding: Junta de Andalucía P07-TIC-02611; Ministerio de Educación y Ciencia TIN2011-28956-C02-0
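
    To make the notion of a contact map concrete, here is a minimal sketch of how one is commonly derived from known 3D coordinates: two residues are marked as being in contact when their representative atoms lie within a distance cutoff. The 8 Å default, the `min_separation` parameter and the function name are assumptions for illustration; the abstract does not state the definition used by the authors.

```python
import math
from typing import List, Sequence


def contact_map(
    coords: Sequence[Sequence[float]],
    threshold: float = 8.0,      # distance cutoff in angstroms (assumed value)
    min_separation: int = 1,     # minimum sequence distance |i - j| to consider
) -> List[List[int]]:
    """Return a symmetric binary matrix: 1 where two residues are in contact."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap
```

    A predictor such as the one described above would estimate the entries of this matrix from sequence-derived features rather than from coordinates.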

    Towards low-carbon conferencing: acceptance of virtual conferencing solutions and other sustainability measures in the ALIFE community

    The latest report from the Intergovernmental Panel on Climate Change (IPCC) estimated that humanity has a time window of about 12 years in which to prevent anthropogenic climate change of catastrophic magnitude. Greenhouse gas emissions from air travel, which are currently rising, are possibly one of the factors that can be most readily reduced. Within this context, we advocate for the re-design of academic conferences in order to decrease their environmental footprint. Today, virtual technologies hold the promise of substituting for many forms of physical interaction and are increasingly making their way into conferences to reduce the number of travelling delegates. Here, we present the results of a survey in which we gathered the opinions of academics worldwide on this topic. Results suggest there is ample room for challenging the (dangerous) business-as-usual inertia of the scientific lifestyle.

    The intersection of evolutionary computation and explainable AI

    In the past decade, Explainable Artificial Intelligence (XAI) has attracted great interest in the research community, motivated by the need for explanations in critical AI applications. Some recent advances in XAI are based on Evolutionary Computation (EC) techniques, such as Genetic Programming. We call this trend EC for XAI. We argue that the potential of EC methods has not yet been fully exploited in XAI, and call on the community for further efforts in this field. Likewise, we find that there is a growing concern in EC regarding the explanation of population-based methods, i.e., their search process and outcomes. While some attempts have been made in this direction (although, in most cases, those are not explicitly put in the context of XAI), we believe that there are still several research opportunities and open research questions that, in principle, may promote a safer and broader adoption of EC in real-world applications. We call this trend XAI within EC. In this position paper, we briefly overview the main results in these two trends, and suggest that the EC community may play a major role in the achievement of XAI.

    Enhancing the scalability of a genetic algorithm to discover quantitative association rules in large-scale datasets

    Association rule mining is a well-known methodology to discover significant and apparently hidden relations among attributes in a subspace of instances from datasets. Genetic algorithms have been extensively used to find interesting association rules. However, the rule-matching task of such techniques usually demands substantial computation and memory. The use of efficient computational techniques has become a task of the utmost importance due to the high volume of data generated nowadays. Hence, this paper aims at improving the scalability of quantitative association rule mining techniques based on genetic algorithms to handle large-scale datasets without quality loss in the results obtained. For this purpose, a new representation of the individuals, new genetic operators and a windowing-based learning scheme are proposed to accomplish this challenging task. Specifically, the proposed techniques are integrated into the multi-objective evolutionary algorithm named QARGA-M to assess their performance. Both the standard version and the enhanced one of QARGA-M have been tested on several datasets with different numbers of attributes and instances. Furthermore, the proposed methodologies have been integrated into other existing techniques based on genetic algorithms to discover quantitative association rules. The comparative analysis performed shows significant improvements of QARGA-M and other existing genetic algorithms in terms of computational costs without losing quality in the results when the proposed techniques are applied. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía TIC-7528; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
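
    The rule-matching task that this abstract identifies as the bottleneck can be illustrated with a short sketch. It computes the standard support and confidence of a quantitative association rule whose antecedent and consequent are sets of attribute intervals. The dictionary-based rule encoding and the names `covers` and `support_confidence` are assumptions for the example; they are not the representation proposed in the paper or used by QARGA-M.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]          # inclusive (low, high) bounds
Itemset = Dict[str, Interval]           # attribute name -> interval
Row = Dict[str, float]                  # one dataset instance


def covers(itemset: Itemset, row: Row) -> bool:
    """True when every attribute of the itemset falls inside its interval."""
    return all(low <= row[attr] <= high for attr, (low, high) in itemset.items())


def support_confidence(
    antecedent: Itemset, consequent: Itemset, rows: List[Row]
) -> Tuple[float, float]:
    """Scan the whole dataset once and return (support, confidence) of A -> C."""
    n_antecedent = sum(covers(antecedent, r) for r in rows)
    n_both = sum(covers(antecedent, r) and covers(consequent, r) for r in rows)
    support = n_both / len(rows) if rows else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence
```

    Each fitness evaluation needs a pass like this over the instances, for every individual in every generation, which is why evaluating on a subset of the data at a time (as a windowing scheme does) can reduce the cost.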

    Contact map prediction using a large-scale ensemble of rule sets and the fusion of multiple predicted structural features

    Motivation: The prediction of a protein’s contact map has in recent years become a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of the prediction of a variety of structural aspects of protein residues, (ii) an ensemble strategy used to facilitate the training process and (iii) a rule-based machine learning system from which we can extract human-readable explanations of the predictor and derive useful information about the contact map representation. Results: The main part of the evaluation is the comparison against the sequence-based contact prediction methods from CASP9, where our method presented the best rank in five out of the six evaluated metrics. We also assess the impact of the size of the ensemble used in our predictor to show the trade-off between performance and training time of our method. Finally, we also study the rule sets generated by our machine learning system. From this analysis, we are able to estimate the contribution of the attributes in our representation and how these interact to derive contact predictions.

    Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology

    Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high throughput abilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways: first, directly to mass spectral peaks, and second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also includes the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes.